Text-to-speech conversion with neural networks: a recurrent TDNN approach

Authors

Orhan Karaali

Gerald Corrigan

Ira A. Gerson

Noel Massey

Abstract

This paper describes the design of a neural network that performs the phonetic-to-acoustic mapping in a speech synthesis system. The use of a time-delay neural network (TDNN) architecture limits discontinuities that occur at phone boundaries. Recurrent data input also helps smooth the output parameter tracks. Independent testing has demonstrated that the voice quality produced by this system compares favorably with speech from existing commercial text-to-speech systems.
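The title's TDNN (time-delay neural network) and the abstract's "recurrent data input" combine two ideas: each output frame is computed from a window of neighbouring input frames, and previously produced output frames are fed back in as extra inputs. The following minimal pure-Python sketch illustrates a single layer that combines both. It is an illustration under stated assumptions, not the authors' network: the layer shape, the tanh activation, zero padding before the first frame, causal (past-only) delays, and feeding back only the immediately preceding output frame (a Jordan-style loop) are all our choices. The names `recurrent_tdnn_layer` and `random_matrix` are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[Vector]  # row j holds the weights feeding output unit j


def recurrent_tdnn_layer(
    frames: Sequence[Vector],
    w_delay: Sequence[Matrix],
    w_rec: Matrix,
    bias: Vector,
) -> List[Vector]:
    """Hypothetical time-delay layer with output feedback (sketch only).

    y[t] = tanh(bias + sum_d w_delay[d] @ x[t - d] + w_rec @ y[t - 1])

    Input frames before t = 0 and the output before t = 0 are treated
    as zero vectors (an assumption, not taken from the paper).
    """
    out_dim = len(bias)
    in_dim = len(frames[0]) if frames else 0
    prev_out = [0.0] * out_dim
    outputs: List[Vector] = []
    for t in range(len(frames)):
        y: Vector = []
        for j in range(out_dim):
            acc = bias[j]
            # Time-delay connections: unit j sees the current frame and
            # the previous len(w_delay) - 1 frames, each with its own weights.
            for d, w in enumerate(w_delay):
                if t - d >= 0:
                    x = frames[t - d]
                    acc += sum(w[j][i] * x[i] for i in range(in_dim))
            # Recurrent connections from the previous output frame.
            acc += sum(w_rec[j][k] * prev_out[k] for k in range(out_dim))
            y.append(math.tanh(acc))
        outputs.append(y)
        prev_out = y
    return outputs


def random_matrix(rows: int, cols: int, rng: random.Random,
                  scale: float = 0.1) -> Matrix:
    """Small uniform random weights, for the toy demo only."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    in_dim, out_dim, delays = 4, 3, 3  # toy sizes, not the paper's
    # A toy phonetic input: two phones, each held for 5 frames (one-hot).
    frames = [[1.0, 0.0, 0.0, 0.0]] * 5 + [[0.0, 1.0, 0.0, 0.0]] * 5
    w_delay = [random_matrix(out_dim, in_dim, rng) for _ in range(delays)]
    w_rec = random_matrix(out_dim, out_dim, rng, scale=0.5)
    bias = [0.0] * out_dim
    for t, y in enumerate(recurrent_tdnn_layer(frames, w_delay, w_rec, bias)):
        print(t, [round(v, 3) for v in y])
```

With untrained random weights the printed values mean nothing acoustically. The point is structural: because each output frame depends on several input frames and on the previous output, the effect of the phone boundary at frame 5 reaches several neighbouring output frames instead of a single one. That is the intuition behind the abstract's smoothing claims, not a demonstration of them.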


Similar resources

Speech Synthesis with Neural Networks

Text-to-speech conversion has traditionally been performed either by concatenating short samples of speech or by using rule-based systems to convert a phonetic representation of speech into an acoustic representation, which is then converted into speech. This paper describes a system that uses a time-delay neural network (TDNN) to perform this phonetic-to-acoustic mapping, with another neural n...


Speaker-independent 3D face synthesis driven by speech and text

In this study, a complete system that generates visual speech by synthesizing 3D face points has been implemented. The estimated face points drive MPEG-4 facial animation. This system is speaker independent and can be driven by audio or both audio and text. The synthesis of visual speech was realized by a codebook-based technique, which is trained with audio-visual data from a speaker. An audio...


Multi-State Time Delay Neural Networks for Continuous Speech Recognition

Alex Waibel, Carnegie Mellon University, Pittsburgh, PA 15213. We present the "Multi-State Time Delay Neural Network" (MS-TDNN) as an extension of the TDNN to robust word recognition. Unlike most other hybrid methods, the MS-TDNN embeds an alignment search procedure into the connectionist architecture and allows for word-level supervision. The resulting system has the ability to ma...


Continuous Speech Phoneme Recognition Using Dynamic Artificial Neural Networks

Phoneme classification and recognition is the first step toward large-vocabulary continuous speech recognition. This step represents the acoustic modeling part of such a system. In hybrid speech recognition systems, phoneme recognition is performed by artificial neural networks (ANNs). The main objective of this paper is the investigation of dynamic ANNs, namely the Time-Delay Neural Networks (TDNN) an...


A Hybrid Stochastic Connectionist Approach to Automatic Speech Recognition

This report focuses on a hybrid approach, including stochastic and connectionist methods, for continuous speech recognition. Hidden Markov Models (HMMs) are a popular stochastic approach used for continuous speech, well suited to cope with the high variability found in natural utterances. On the other hand, artificial neural networks (NNs) have shown high classification power for short speech ut...


Journal: CoRR

Volume: cs.NE/9811032

Publication date: 1997
